Resolution of the prisoner's dilemma by means of particle swarm optimization

Abstract

We study the evolution of cooperation among selfish individuals in the stochastic strategy spatial prisoner's 
dilemma game. We equip players with the particle swarm optimization technique, and find that it may 
lead to highly cooperative states even if the temptations to defect are strong. The concept of particle 
swarm optimization was originally introduced within a simple model of social dynamics that can describe the 
formation of a swarm, i.e., analogous to a swarm of bees searching for a food source. Essentially, particle 
swarm optimization foresees changes in the velocity profile of each player, such that the best locations are 
targeted and eventually occupied. In our case, each player keeps track of the highest payoff attained within 
a local topological neighborhood and its individual highest payoff. Thus, players make use of their own 
memory that keeps score of the most profitable strategy in previous actions, as well as use of the knowledge 
gained by the swarm as a whole, to find the best available strategy for themselves and the society. Following 
extensive simulations of this setup, we find a significant increase in the level of cooperation for a wide range 
of parameters, and also a full resolution of the prisoner's dilemma. We also demonstrate extreme efficiency of 
the optimization algorithm when dealing with environments that strongly favor the proliferation of defection, 
which in turn suggests that swarming could be an important phenomenon by means of which cooperation can 
be sustained even under highly unfavorable conditions. We thus present an alternative way of understanding 
the evolution of cooperative behavior and its ubiquitous presence in nature, and we hope that this study will 
be inspirational for future efforts aimed in this direction. 

Introduction 

Cooperation is the basis for complex organizational structures in biological as well as social systems.
Nevertheless, understanding the emergence and stability of cooperative behavior in the context of Darwinian
selection remains a challenge to date. The dilemmas of cooperation are usually tackled within the framework
of evolutionary game theory [1-3]. Although several mechanisms allowing for the evolution of cooperation have
already been identified [4], the resolution of social dilemmas and the closely related avoidance of the "tragedy
of the commons" [5] is still considered an open problem. The prisoner's dilemma game [6], in particular,
has attracted considerable attention in the past three decades [7-10], and to date it is widely considered a
paradigmatic example for the tensions between social welfare and individual interests [11-33]. Cooperation
and defection are the two strategies that are at the heart of the prisoner's dilemma game. In general, while 
cooperators sacrifice some of their personal fitness for the benefit of the society, defectors succumb to the 
temptations and take full advantage of them. The prisoner's dilemma captures this situation by means of the 
following payoffs: mutual cooperation yields the reward R, mutual defection leads to punishment P, and the 
mixed choice gives the cooperator the sucker's payoff S and the defector the temptation T. The payoff ranking
thus satisfies T > R > P > S. In the iterated prisoner's dilemma game the assumption that mutual
cooperation yields the highest collective income imposes another constraint, namely 2R > T + S. This makes
it clear that the rational (selfish) action is to defect, and according to the fundamental principles of Darwinian
selection, the extinction of cooperation is inevitable. Full defection is indeed the only stable Nash equilibrium for the
prisoner's dilemma game in well-mixed populations. 
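
The two payoff conditions stated above can be checked mechanically. The following minimal Python sketch is illustrative only: the function names and the numerical payoff values are assumptions chosen for the example, not values taken from this study. It tests the ranking T > R > P > S, the additional constraint 2R > T + S of the iterated game, and confirms that defection is the best reply to either choice of the opponent.

```python
def is_prisoners_dilemma(T, R, P, S, iterated=False):
    """Check T > R > P > S, and additionally 2R > T + S for the iterated game."""
    ranking_ok = T > R > P > S
    if iterated:
        return ranking_ok and 2 * R > T + S
    return ranking_ok


def best_response(opponent_cooperates, T, R, P, S):
    """Return the payoff-maximizing action ("C" or "D") against a fixed opponent action."""
    if opponent_cooperates:
        # Cooperating yields R, defecting yields T.
        return "D" if T > R else "C"
    # Cooperating yields S, defecting yields P.
    return "D" if P > S else "C"


# Illustrative payoff values (assumed for this example only).
print(is_prisoners_dilemma(T=5, R=3, P=1, S=0, iterated=True))  # True
print(is_prisoners_dilemma(T=7, R=3, P=1, S=0, iterated=True))  # False, since 2R = 6 < T + S = 7
print(best_response(True, T=5, R=3, P=1, S=0))                  # D
print(best_response(False, T=5, R=3, P=1, S=0))                 # D
```

Because defection is the best reply in both cases, a population of purely payoff-maximizing players without any further structure ends up in mutual defection, which is the situation the spatial setup is meant to overcome.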

Since the seminal paper by Nowak and May [34], however, we know that this may not be the case for spatial 
interactions. Although not universally applicable [35], spatial reciprocity is recognized as a potent promoter of 
cooperative behavior, even more so on complex networks [36-40] (for a comprehensive review see [8]). Other
prominent mechanisms promoting cooperation are kin selection [41], direct and indirect reciprocity [42-44], as
well as group selection [45-47], to name but a few.

Inspired by previous works on this subject, we here introduce particle swarm optimization [48-50] to the
players engaging in the prisoner's dilemma game on a square lattice [51], with the aim of investigating its 
impact on the evolution of cooperation. However, we abandon the commonly considered assumption that the 
players can choose only between the two pure strategies, namely to either cooperate or to defect. Real-life 
situations are often more complex than that, and indeed there is a lot of gray between the black and white 
extremes. Motivated by this fact, we here consider stochastic strategies, such that the cooperativeness of
each player is determined by W ∈ [0, 1]. W = 1 returns full cooperation, while W = 0 returns full defection.
These are the two extremes recovered from our present setup. In between, for 0 < W < 1, however, there exists
a continuous set of strategies that can be considered either as predominantly cooperative (if W > 0.5) or 
predominantly defective (if W < 0.5). Moreover, while the evolution of strategies is traditionally performed 
by means of different strategy adoption (or updating) rules (see [8] for a comprehensive review), we here take 
a much less explored avenue, namely by considering the aforementioned particle swarm optimization as the 
driving force behind strategy evolution. The particle swarm optimization algorithm is based on a simplified 
social model that is tightly tied to the theory of swarming [48-50]. A traditional analogy is a swarm of bees
searching for a food source. In this analogy, each bee (considered here as a particle) makes use of its own 
memory as well as the knowledge obtained by the swarm as a whole, to find the best available food source. 
Particle swarm optimization can also be considered as being representative for multidimensional search (for 
example to find an optimum of a utility function). Typically, a number of simple entities (the "particles") is 
randomly positioned in the search space, and to each a velocity vector is assigned, which is subsequently used 
to update the current position of each particle in the swarm. Each particle then proceeds by evaluating the
objective function at its current location, and finally determines its movement through the search space
by combining aspects of the history of its own current and best (potentially optimal) locations with
those of one or more members of the swarm. Thus, the process makes use of the memory of each particle, as
well as the knowledge gained by the swarm as a whole. The next iteration takes place after all the particles
have moved once. Eventually the swarm, like a flock of birds collectively foraging for food, is likely to move
closer to an optimum of the utility function. Accordingly, the particles (bees, birds, players) should
tend to fly towards better and better areas over the course of the search process.

Here we focus specifically on introducing the particle swarm optimization algorithm to the strategy updating 
process in the stochastic strategy prisoner's dilemma game on the square lattice. In agreement with the above 
described general concept, each individual is assigned a variable from the unit interval determining its level of 
cooperativeness (or willingness to cooperate). Likewise, a velocity vector is assigned to every player. Following 
this initialization, each player makes use of its own memory (i.e., keeping score of the most profitable individual 
strategy in the past), as well as use of the knowledge gained by the swarm (i.e., the nearest neighbors) as a 
whole, to find the best available strategy for itself and the society. In particular, the particle swarm optimization 
algorithm makes use of the velocity vector to update the current strategy of each player in the swarm. In 
this sense our study can be considered related to previous works investigating the effects of mobility on the 
evolution of cooperation [52-57], although it relies on an essentially different algorithm. The outline of the
latter is as follows: 1) Start with a set of strategies (i.e., cooperation probabilities W) that are initially
uniformly distributed in the [0, 1] interval. 2) Calculate a velocity vector for each strategy in the swarm. 3)
Update the strategy of each agent, using its previous value and the updated velocity vector. 4) Go to step 2
and repeat until convergence. All the details of this setup are described in the Methods section, while here we
proceed with presenting the main results.

Results 

We start by presenting the average level of cooperation, defined as $N^{-1} \sum_i W(i)$, where N is the system size
and i runs over all the players in the population, in dependence on the temptation to defect b for different
values of ω (for the definition see the Methods section) in Fig. 1. Expectedly, the average level of cooperation
decreases as b increases for all ω. However, while for ω = 0 the cooperative behavior dies out completely
at high values of b, for ω = 1 the average level of cooperation hovers comfortably over 1/3, even when the
maximal b = 2 limit is reached. For intermediate and low values of b, however, small values of ω may yield
overall higher average levels of cooperation. It is thus intriguing to find that the introduced particle swarm
optimization in the strategy updating, fine-tuned by means of the parameter ω, can be responsible for the
emergence of cooperative behavior across the whole span of defection temptation values, as well as for its
dominance at low values of b. More precisely, two regimes can be differentiated. For b < 1.5 intermediate and
high values of ω are actually detrimental for the evolution of cooperation, while for b > 1.5 the higher the ω
the higher the stationary level of cooperative behavior. These results make it clear that low values of ω (e.g., ω = 0.01)
strongly support cooperation for small b, up to b ≈ 1.2, whereas high values of ω are much better suited for
cooperation to evolve under this dynamics in strongly defection-prone environments. At this point we argue
that for ω → 1, when players imitate their best past actions rather than the best players in the swarm (see
Methods for details), the proposed strategy updating rule warrants the most significant benefits to cooperative
behavior if looking at the entire range of b values, thus in turn resolving the prisoner's dilemma.

In order to obtain an understanding of these results, we first systematically analyze the impact of ω on
the final distribution of strategies in the whole population for various values of b, as depicted in Fig. 2. Note
that for ω = 0.01 the distribution of strategies is very monotonous, while for ω = 0.99 much more diversity
is inferable. Both observations are virtually independent of b. Since the parameter ω ∈ [0, 1] determines
the tendency of every player to either adopt the most profitable strategy in its past actions (ω → 1) or the
strategy of the most successful player in its neighborhood (ω → 0), these results can be understood very
well. In particular, for ω = 0.01 individuals are strongly inclined to imitate the best-performing strategies in
the swarm, irrespective of their personal experience in the past. This narrow-sightedness inevitably results in
strongly polarized distributions, as only pure cooperators or pure defectors are likely to
have the overall highest payoffs. Note that this is because the payoffs are directly scaled by W (see Methods).
Conversely, for ω = 0.99 the situation is very different since players will focus on their own past actions and
learn from them in order to arrive at the best possible strategy. This has the advantage that, unlike for
ω = 0.01, here only the immediate neighborhood is explicitly taken into account. For high values of b local
considerations are obviously much more important than for low values of b. In the latter case, the nearest
neighbors can much more easily be neglected since the environment on its own is not strongly favorable for defectors,
and hence cooperators can prevail even if overlooking the detailed distribution of strategies in their immediate
neighborhood. An additional advantage of small ω, however, is that by focusing only (or predominantly)
on the best-performing players in the swarm, the average level of cooperativeness can be maximized more
efficiently (as evidenced by results presented in Fig. 1). But if the temptation to defect is strong the strictly
local considerations are much more important, as proper adaptation is then crucial for cooperators to survive.
Accordingly, for high values of b higher ω yield better results (higher average level of cooperation) by exploiting
effectively the whole array of available strategies to respond properly (locally properly) to invading defectors.
At low values of b, however, these locally optimal adaptations (warranted by ω → 1) might be less effective
than the more globally inspired actions (warranted by ω → 0).

These conclusions can be corroborated further by examining characteristic snapshots of strategy and velocity
distributions for key combinations of b and ω, as presented in Figs. 3 and 4. Focusing first on the distribution
of strategies in Fig. 3, it can be inferred that for ω = 0.01, where only the most successful strategies within
the whole swarm can spread rapidly due to the workings of the particle swarm optimization algorithm, the
strategy distribution becomes very monotonous, leading to the isolation of homogeneous groups of players
characterized either by W = 0 or W = 1, respectively. This holds irrespective of b, only that for strong
temptations to defect the clusters of strongly cooperative players become rarer. Note that in this parameter
region the here studied stochastic strategy prisoner's dilemma game actually becomes strikingly similar to the
classical two-strategy spatial prisoner's dilemma game [34, 51], where the clustering of cooperators is the main
driving force prohibiting the full dominance of defectors. Conversely, for ω = 0.99, where the particle swarm
optimization algorithm is driven by the past experience of every individual player (rather than the swarm as a
whole), highly heterogeneous kaleidoscopes appear, and it is indeed this diversity that warrants a high level of
cooperativeness even at strong temptations to defect. In particular, snapshots in the bottom panel of Fig. 3
indicate that many clusters consist of a small number of players with a high cooperation level (i.e., W close to
1), surrounded by players with comparatively lower W values. This in turn implies that it is not the clustering itself
that is crucial for the sustenance of cooperation, but rather the aggregation of such clusters, which enables
the players with higher cooperation level to survive the evolutionary process. Note that the high cooperation
level within clusters provides surrounding individuals with a safe source of benefits that are sufficient to resist
the invasion of predominantly defective (i.e., W close to 0) players. The particle swarm optimization algorithm
thus spontaneously generates the diversity needed for cooperation to survive at high b, much by means of the
same mechanism that was reported previously for manually introduced heterogeneous states [58]. Of course,
players located in the interior of such clusters enjoy the benefits of mutual cooperation and are therefore able
to survive despite the constant exploitation by defectors, yet this positive effect is additionally amplified by the
diversity and the hierarchical local structures that give additional strength to the cooperative strategy, while
at the same time providing no benefits for defectors.

Moreover, by examining the characteristic distributions of velocities presented in Fig. 4, we can obtain
further insight with regard to the evolution of the strategies and their adaptation. Note that by means of
Eqs. (1) and (2) (see the Methods section), the two quantities are strongly interdependent. For ω = 0.01,
even though the snapshots are taken in the stationary state (where the average level of cooperation is stable),
the majority of players will have a velocity very different from 0 (although on average over time and space it
is virtually zero, thus assuring that the stationary state is reached). This indicates that players will constantly
try to reach the currently maximal payoff in the swarm, despite the fact that for the majority this will be
unattainable. The locally high velocity values also indicate that the evolutionary process at low values of ω is
quite violent and fast, with the population therefore unable to cope with high temptations to defect. Conversely,
for ω = 0.99 the situation is very different. Here the majority of players will adapt their strategy very slowly to
the changing local influences, which keeps the velocity of every player very close to zero. These
conclusions are valid practically irrespective of b for the two considered values of ω, but the average level of
cooperation is in fact very much different. While individually optimal past strategies in the particle swarm
optimization algorithm yield a slow but stable and very effective response even to severe defector attacks,
the population-wide (or swarm-wide) pursuit of extraordinary benefits proves insufficiently effective to sustain
cooperative behavior at high b values. The latter approach, however, may be superior at low temptations
to defect, where local considerations are not so vital, and where the pursuit of individual benefits can be
successful even if driven by globally-inspired fast and bold actions.

Summary 

In sum, we have studied the impact of particle swarm optimization on the evolution of cooperation in the 
stochastic strategy spatial prisoner's dilemma game. The strategy updating was guided by the particle swarm 
optimization algorithm, using as input the individual memory of every player (i.e., keeping score of the most 
profitable individual strategy in the past) as well as the knowledge gained by the swarm (i.e., the nearest 
neighbors) as a whole. By means of extensive simulations, we found that cooperative behavior can prevail in 
large regions of the parameter space defining the stochastic strategy prisoner's dilemma game, thus effectively 
leading to the resolution of the dilemma in favor of pro-social behavior. In particular, we have demonstrated that
imitating the most profitable strategy in the swarm may lead to full dominance of cooperation at moderate
temptations to defect, while imitating the best individual actions in the past may lead to the survival of
cooperative behavior even if the environment is strongly prone to defection. We have also investigated the
actual strategy configurations in the population as well as the pertaining spatial distributions of strategies and
velocities, which we have found to be closely tied to the setup of the particle swarm optimization algorithm,
and in fact instrumental for the understanding of the observed promotion of the evolution of cooperation. We
hope that our work will offer new ways of ensuring cooperation in situations constituting a social dilemma, 
and that it will be an inspiration for future research when considering the very interesting combination of 
intelligent algorithms and evolutionary games. 

Methods 

We consider an evolutionary stochastic strategy prisoner's dilemma game on a square lattice, consisting of 
100 × 100 players with nearest-neighbor interactions and periodic boundary conditions. Initially the strategies
of all players are drawn randomly from uniformly distributed values of W in the [0, 1] interval, whereby W
determines the cooperativeness of each individual (or the willingness to cooperate). While W = 1 returns full
cooperation and W = 0 returns full defection, for 0 < W < 1 there exists a continuous set of strategies
that can be considered either as being predominantly cooperative (if W > 0.5) or predominantly defective (if 
W < 0.5), hence constituting a stochastic strategy version of the prisoner's dilemma game. 

Players interact pairwise with all their nearest neighbors, thereby receiving payoffs that can be summarized
succinctly by the rescaled payoff matrix

\[
\begin{array}{c|cc}
  & C & D \\ \hline
C & W(i)\,W(j) & 0 \\
D & b\,W(j)\,[1 - W(i)] & 0
\end{array}
\]

where W(i) and W(j) define the level of cooperativeness of players i and j, respectively. This setup entails b
as the only free parameter determining the temptation to defect, but it is well-known that the essence of the
prisoner's dilemma game is thereby left intact [34].
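
As an illustration of how the payoff matrix translates into accumulated payoffs, the following Python sketch sums the payoff of each player over its four nearest neighbors on a periodic square lattice. It is a sketch under stated assumptions rather than the code used in this study: the payoff of player i against j is taken to be the expectation over both players' stochastic choices, W(i)W(j) + bW(j)[1 - W(i)], with a payoff of zero whenever the opponent defects, and the names `pair_payoff` and `accumulate_payoffs` are hypothetical.

```python
import random


def pair_payoff(w_i, w_j, b):
    """Payoff of focal player i against neighbor j.

    Assumption: the two nonzero matrix entries are summed, i.e. mutual
    cooperation pays 1 and exploiting a cooperating neighbor pays b.
    """
    return w_i * w_j + b * w_j * (1.0 - w_i)


def accumulate_payoffs(W, b):
    """Total payoff of every player on an L x L lattice with periodic boundaries."""
    L = len(W)
    payoffs = [[0.0] * L for _ in range(L)]
    for x in range(L):
        for y in range(L):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = (x + dx) % L, (y + dy) % L  # wrap around the edges
                payoffs[x][y] += pair_payoff(W[x][y], W[nx][ny], b)
    return payoffs


# Small example: strategies drawn uniformly from [0, 1], as in the initial state.
rng = random.Random(0)
size = 10
lattice = [[rng.random() for _ in range(size)] for _ in range(size)]
lattice_payoffs = accumulate_payoffs(lattice, b=1.5)
average_cooperation = sum(map(sum, lattice)) / (size * size)
print(f"average level of cooperation: {average_cooperation:.3f}")
```

Under these assumptions a pure defector (W = 0) surrounded by four pure cooperators (W = 1) collects 4b, whereas each player in a fully cooperative neighborhood collects 4, which is the temptation that the updating rule has to counteract.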

The stochastic strategy prisoner's dilemma game is iterated forward in time using a synchronous Monte 
Carlo updating scheme. First, each player accumulates its payoff by playing the game with all four of its 
nearest neighbors. Subsequently, players have to decide what strategy they will adopt in the next round (i.e., 
what will their new W(i) be), which we here determine by means of the particle swarm optimization algorithm.
Its implementation is simple and intuitive, as follows. Initially, at time step n = 0, all players are assigned the
same velocity $V_{i,n} = 0$. For each following n, the velocity vector $V_{i,n}$ of every player i is updated according to

\[
V_{i,n+1} = V_{i,n} + \omega\,[W(i,h) - W(i,n)] + (1 - \omega)\,[W(\star,n) - W(i,n)], \tag{1}
\]

and the strategy follows directly as

\[
W(i,n+1) = W(i,n) + V_{i,n+1}, \tag{2}
\]

where in Eq. (1) W(i, h) is the most profitable strategy of player i in all its past actions, whereas W(★, n) is
the best performing strategy in the swarm (here considered to be composed of the four nearest neighbors).
The parameter ω ∈ [0, 1] determines the tendency of every player to either adopt the most profitable strategy
in its past actions or the current strategy of the most successful player within the swarm. In particular, ω = 1
implies that the player will definitely imitate its past best action, i.e., the strategy that in the past yielded
the highest payoff. On the other hand, ω = 0 implies that the player will copy the strategy of the currently
best performing player in its neighborhood. Intermediate values of ω interpolate linearly between these two
extremes. Besides the temptation to defect b, ω is here considered as the second crucial system parameter.
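
Eqs. (1) and (2) can be sketched in Python as one synchronous update of the whole lattice. This is a minimal sketch under stated assumptions, not the implementation behind the reported results: the helper names are hypothetical, ties between equally successful neighbors are broken by list order, and strategies are clipped to [0, 1] after the update, since the text does not specify how values leaving the unit interval are handled. The accumulated payoffs of the current round are passed in as an argument.

```python
import random

NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def init_state(L, seed=0):
    """Uniform random strategies, zero velocities, and an empty payoff memory."""
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(L)] for _ in range(L)]
    V = [[0.0] * L for _ in range(L)]
    best_W = [row[:] for row in W]
    best_payoff = [[float("-inf")] * L for _ in range(L)]
    return W, V, best_W, best_payoff


def pso_step(W, V, payoffs, best_W, best_payoff, omega):
    """Apply Eqs. (1) and (2) to every player; returns the new (W, V).

    best_W and best_payoff hold each player's memory and are updated in place.
    """
    L = len(W)
    new_W = [row[:] for row in W]
    new_V = [row[:] for row in V]
    for x in range(L):
        for y in range(L):
            # W(i, h): remember the strategy that earned the highest payoff so far.
            if payoffs[x][y] > best_payoff[x][y]:
                best_payoff[x][y] = payoffs[x][y]
                best_W[x][y] = W[x][y]
            # W(star, n): current strategy of the best-paid nearest neighbor.
            neighbors = [((x + dx) % L, (y + dy) % L) for dx, dy in NEIGHBOR_OFFSETS]
            sx, sy = max(neighbors, key=lambda p: payoffs[p[0]][p[1]])
            w = W[x][y]
            # Eq. (1): velocity update.
            v = V[x][y] + omega * (best_W[x][y] - w) + (1.0 - omega) * (W[sx][sy] - w)
            new_V[x][y] = v
            # Eq. (2), followed by clipping to [0, 1] (an assumption of this sketch).
            new_W[x][y] = min(1.0, max(0.0, w + v))
    return new_W, new_V


# Toy usage with placeholder payoffs (random numbers, for illustration only).
W, V, best_W, best_payoff = init_state(L=5)
toy_rng = random.Random(1)
toy_payoffs = [[toy_rng.random() for _ in range(5)] for _ in range(5)]
W, V = pso_step(W, V, toy_payoffs, best_W, best_payoff, omega=0.99)
```

With zero velocity, ω = 0 makes a player copy its best-paid neighbor exactly and ω = 1 returns it to its most profitable past strategy, in line with the two extremes described above. Because the velocity is carried over between rounds, later updates generally overshoot or undershoot these targets.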

Figure 1. Average level of cooperation in dependence on b for different values of ω. It can
be observed that while imitating the best performing player in the swarm (ω → 0) might be beneficial
at low temptations to defect, imitating personal success (ω → 1) is definitely better for the evolution
of cooperation in strongly defection-prone environments. Each data point is an average of the final
outcome (stationary state) of the game over 100 independent realizations. Lines connecting the symbols
are just to guide the eye.

Figure 2. Distribution of strategies in the whole population, as obtained for different
combinations of b and ω. It can be observed that for ω = 0.01 the nature of the stochastic strategy
prisoner's dilemma game is essentially completely overridden by the selfish drive of players to reach the
highest current payoffs in the swarm, in turn virtually completely transforming the game to its
two-strategy [only W = 0 (full defection) or W = 1 (full cooperation) strategies are present in the
population] version. Conversely, for ω = 0.99 the full spectrum of available strategies is exploited to
arrive at the final stationary state. Note that the horizontal axis displays the willingness to cooperate
W (defining the strategy of every player), while the vertical axis depicts the probability that this
strategy is present in the population. Depicted results are averages of the final outcome (stationary
state) over 100 independent realizations.

Figure 3. Characteristic spatial distributions of strategies, as obtained for different
combinations of b and ω. As concluded from results depicted in Fig. 2, for low values of ω only the
two "extreme" strategies (with rare exceptions) are adopted, while for high values of ω the whole array
of available strategies comes into play. Moreover, it is interesting to observe that values of ω → 0 yield
the well-known clustering of cooperators [31] on the square lattice, while the snapshots for ω = 0.99
seem to have this feature somewhat less pronounced, although still clearly inferable (note that the
distinction of clusters is somewhat difficult due to the continuous array of possible strategies). This
suggests that, besides the clustering of cooperators, additional mechanisms may underlie the survival of
cooperators at high temptations to defect and ω → 1 within the present setup. The color encoding, as
depicted on the right, indicates the values of W for each individual player.

Figure 4. Characteristic spatial distributions of velocities, as obtained for different
combinations of b and ω. Top row depicts results for ω = 0.01, while bottom row features results for
ω = 0.99. Irrespective of b, it can be observed that for ω = 0.99 the whole population essentially
becomes a swarm in that the velocities of all players are much the same and close to zero. The fact that
the prevailing velocity is close to zero simply reflects that the stationary state has been reached by
means of adaptive, locally-inspired and slow strategy changes (which are, however, very effective even if
the temptations to defect are strong). For ω = 0.01, however, only isolated clusters can be considered to
act as swarms, while the majority of players cannot be associated with any kind of group dynamics and
is simply caught in the futile pursuit of the highest, yet for the majority unattainable, payoffs. These
results indicate that swarming is an important agonist that promotes cooperation at high temptations
to defect (see results presented in Fig. 1). The color encoding, as depicted on the right, indicates the values of
$V_{i,n}$ for each individual player, where n was chosen sufficiently large such that the stationary state of
the game has been reached. Importantly, we note that for ω = 0.01 the stationary state has in fact been
reached, although at a given instance in time the average velocity in the population might be different
from zero.



